With crime rates rising, there is a pressing need to analyze the available data and identify practical, efficient ways to respond. Such analysis can help us understand and predict the nature of criminal activity. Data analysis and prediction are widely used in sectors such as business and healthcare, but we believe they have not yet been used to their full potential in the field of crime. Because understanding patterns of criminal activity is vital to countering them effectively, we perform a data analysis of crime data.
We analyze data for the city of Baltimore, scraped from a publicly available website of the Baltimore Police Department. The overall crime rate in Baltimore is 153% higher than the national average. For every 100,000 people, 19.06 crimes occur in Baltimore each day. Baltimore has been listed among the most dangerous cities in the United States. We aim to produce results or predictions that the police can use to adapt their operations according to location, time, and other factors, helping them handle various situations more efficiently.
We initially planned to analyze crime data for the city of Boston but later changed our focus from the Boston Police Department to the Baltimore Police Department. Because we live near Baltimore, we felt it would be better to analyze data from our own surroundings, gain more insight into the criminal activity around us, and hopefully arrive at some useful results.
The dataset records the date and time of each incident, the place where it occurred, whether it was inside or outside, whether a weapon was used, and a description of the offense, among other fields. We analyze criminal activity by fields such as location, time, date, age, sex, race, and description, with the aim of predicting criminal behavior. We plan to visualize the data with the help of different statistical models and to plot maps. These maps help us locate, sort, and compare different areas based on the factors considered in the analysis.
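Before cleaning, it can help to see which of the fields listed above are sparsely populated. The following is a minimal sketch on a made-up three-row frame; the sample values and the helper name `missing_summary` are assumptions for illustration and are not taken from the BPD data.

```python
import pandas as pd

def missing_summary(frame):
    """Return per-column missing counts and percentages, worst first."""
    counts = frame.isnull().sum()
    summary = pd.DataFrame({
        'missing': counts,
        'percent': (counts / len(frame) * 100).round(1),
    })
    return summary.sort_values('missing', ascending=False)

# Hypothetical sample rows (not real BPD records)
sample = pd.DataFrame({
    'CrimeDate': ['01/05/2019', '01/06/2019', '01/07/2019'],
    'Weapon': [None, 'FIREARM', None],
    'Inside/Outside': ['Inside', None, 'Outside'],
})
report = missing_summary(sample)
print(report)
```

Running the same helper on the full frame would show which columns need the null handling applied later in the notebook.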
#importing data set
import pandas as pd
pd.set_option('display.max_columns', None)
df=pd.read_csv('BPD_Arrest_Based_Crime_Data.csv')
df.head()
#Renaming columns
df.rename(columns={'CrimeDate':'Date', 'CrimeTime':'Time', 'CrimeCode':'Code', 'vri_name1':'VRI', 'Inside/Outside':'Inside_Outside', 'Age':'Offender_Age', 'Sex':'Offender_Sex','Race':'Offender_Race'}, inplace=True)
#Dropping columns
df=df.drop(columns=['Location 1','Total Incidents'])
#Rearranging columns
df=df[['Arrest_ID', 'Date', 'Time', 'Code', 'Description', 'Offender_Age', 'Offender_Sex', 'Offender_Race', 'Weapon', 'Location', 'Post', 'District', 'Neighborhood', 'Inside_Outside', 'Premise', 'Latitude', 'Longitude', 'VRI']]
#Modifying data types
df['Date']=pd.to_datetime(df['Date'])
df['Time']=pd.to_datetime(df['Time'])
df['Time']=pd.to_timedelta(df['Time'].dt.strftime('%H:%M:%S'))
df[['Latitude', 'Longitude', 'Offender_Age']]=df[['Latitude', 'Longitude', 'Offender_Age']].apply(pd.to_numeric)
df[['Arrest_ID']]=df[df['Arrest_ID'].isnull()==False]['Arrest_ID'].astype(int).astype(str)
df[['Post']]=df[['Post']].astype('object')
df.dtypes
#Handling null values
df.Arrest_ID.fillna('NA', inplace=True)
df.Time.fillna(0, inplace=True)
df.Offender_Age.fillna('NA', inplace=True)
df.Weapon.fillna('NA', inplace=True)
df.Location.fillna('NA', inplace=True)
df.Post.fillna('NA', inplace=True)
df.Neighborhood.fillna('NA', inplace=True)
df.Inside_Outside.fillna('NA', inplace=True)
df.Premise.fillna('NA', inplace=True)
df.Latitude.fillna('NA', inplace=True)
df.Longitude.fillna('NA', inplace=True)
import numpy as np
df.Latitude = np.where(df.Longitude.eq('NA'), 'NA', df.Latitude)
df.Longitude = np.where(df.Latitude.eq('NA'), 'NA', df.Longitude)
df.VRI.fillna('NA', inplace=True)
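Filling a numeric column with the string 'NA' turns it into an object column, which is why later cells filter on `!='NA'` before plotting. One alternative, shown here only as a sketch on made-up values and not as what this notebook does, is to use the placeholder for text columns and leave numeric gaps as NaN, dropping them where a plot or statistic needs complete values. The frame `toy` and its contents are assumptions for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical rows with gaps in one numeric and one text column
toy = pd.DataFrame({
    'Offender_Age': [25.0, np.nan, 40.0],
    'Weapon': ['KNIFE', None, None],
})

# Text column: a placeholder string keeps the value readable in tables
toy['Weapon'] = toy['Weapon'].fillna('NA')

# Numeric column: keep NaN so the dtype stays float, drop gaps on demand
ages = toy['Offender_Age'].dropna()
mean_age = ages.mean()
print(toy.dtypes)
print(mean_age)
```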
#Renaming values of 'Description' column
df['Description'].replace(['AGG. ASSAULT', 'COMMON ASSAULT'],['ASSAULT - AGGRAVATED', 'ASSAULT - COMMON'], inplace=True)
df['Description'].unique()
#Creating dummies for each type of criminal activity
crimeDesc=list(df['Description'].unique())
crimeDesc=[d.split(' ', 1)[0] if d!='AUTO THEFT' else d for d in crimeDesc]
crimeDesc=list(dict.fromkeys(crimeDesc))
df=pd.concat([df, df.Description.str.findall('|'.join(crimeDesc)).str[0].str.get_dummies()], axis=1)
df.head()
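To make the dummy-column step easier to follow, here is a small sketch of the same idea on a handful of made-up descriptions. The sample strings are assumptions chosen to resemble the 'Description' column; the real column may contain other values.

```python
import pandas as pd

# Hypothetical descriptions in the style of the 'Description' column
desc = pd.Series(['ASSAULT - COMMON', 'LARCENY FROM AUTO', 'AUTO THEFT',
                  'ROBBERY - STREET', 'LARCENY'])

# First word of each description, keeping 'AUTO THEFT' whole
labels = [d.split(' ', 1)[0] if d != 'AUTO THEFT' else d for d in desc.unique()]
labels = list(dict.fromkeys(labels))  # drop duplicates, keep first-seen order

# Take the first label matched in each description, then one-hot encode it
first_match = desc.str.findall('|'.join(labels)).str[0]
dummies = first_match.str.get_dummies()
print(dummies)
```

Each row ends up with exactly one indicator set, so 'LARCENY FROM AUTO' is counted under LARCENY rather than AUTO THEFT.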
#importing pyplot
import matplotlib.pyplot as plt
%matplotlib inline
import calendar
plt.figure(figsize = (12,8))
(df['Date']
.groupby(df.Date.dt.month)
.agg('count')
.plot.bar(color='black')
)
plt.xlabel('Month', fontsize=18)
plt.ylabel('No of Arrests', fontsize=18)
plt.title('Arrests across Months', fontsize = 22)
plt.figure(figsize=(12,8))
plt.xlabel('Age', fontsize=18)
plt.ylabel('No of Arrests', fontsize=18)
df[df['Offender_Age']!='NA'].Offender_Age.hist(bins=50, color= 'black')
plt.figure(figsize = (12,8))
df['District'].value_counts().plot.bar(stacked=True, color='black')
plt.xlabel('District', fontsize=18)
plt.ylabel('No of Arrests', fontsize=18)
plt.title('Distribution of Arrests across various Districts', fontsize=20)
(df.pivot_table('Arrest_ID', index='Offender_Age', columns='District', aggfunc = 'count', fill_value=0)
.plot(kind='bar', stacked = True, figsize=(18,10)))
plt.xlabel('Age of the Offender', fontsize=18)
plt.ylabel('No of Arrests', fontsize=18)
plt.title('Arrests across District and Age', fontsize = 22)
#Pivoting the table
df_new = df[df.Offender_Age!='NA'].pivot_table('Arrest_ID', index='Offender_Age', columns='Offender_Race', aggfunc = 'count', fill_value=0)
plt.figure(figsize=(18,10))
plt.plot(df_new['A'], label='Asian')
plt.plot(df_new['B'], label='African American')
plt.plot(df_new['U'], label='Unknown')
plt.plot(df_new['H'], label='Hispanic')
plt.plot(df_new['W'], label='White')
plt.legend(loc="upper right", title='Race')
plt.xlabel('Age', fontsize=18)
plt.ylabel('No of Arrests', fontsize=18)
plt.title('Distribution of Arrests across Race and Age', fontsize = 22)
(df.pivot_table('Arrest_ID', index=df.Date.dt.year, columns='Offender_Sex', aggfunc = 'count', fill_value=0)
.plot(kind='bar',stacked=True, figsize=(18,10)))
plt.title('Distribution of Arrests by Year & Sex', fontsize = 22)
plt.xlabel('Year', fontsize=18)
plt.ylabel('No of Arrests', fontsize=18)
plt.figure(figsize = (12,8))
(df['Date']
.groupby(df.Time.dt.components.hours)  #hour of day taken directly from the timedelta column
.agg('count')
.plot.bar(stacked=True, color='#0A2229')
)
plt.title('Distribution of Arrests throughout the Day', fontsize = 22)
plt.xlabel('Hour of arrest', fontsize=18)
plt.ylabel('No of Arrests', fontsize=18)
hour_period_map = {
0 : '0-3',
1 : '0-3',
2 : '0-3',
3 : '0-3',
4 : '4-7',
5 : '4-7',
6 : '4-7',
7 : '4-7',
8 : '8-11',
9 : '8-11',
10 : '8-11',
11 : '8-11',
12 : '12-15',
13 : '12-15',
14 : '12-15',
15 : '12-15',
16 : '16-19',
17 : '16-19',
18 : '16-19',
19 : '16-19',
20 : '20-23',
21 : '20-23',
22 : '20-23',
23 : '20-23',
}
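#hour of day taken directly from the timedelta column, then mapped to its period
df['period_in_day'] = df.Time.dt.components.hours.map(hour_period_map)
#weekday_name was removed from pandas; day_name() is its replacement
df['weekday'] = df['Date'].dt.day_name()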
(df.pivot_table('Arrest_ID', index='weekday', columns='period_in_day', aggfunc = 'count', fill_value=0)
.plot(kind='bar',stacked=True, figsize=(18,10)))
plt.xlabel('Day in the week', fontsize=18)
plt.ylabel('No of Arrests', fontsize=18)
plt.title('Distribution of Arrests by Weekday and Time of Day', fontsize = 22)
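The four-hour periods used above follow a regular pattern, so the lookup table can also be generated rather than typed out. This is a minimal sketch; the function name `hour_to_period` and the variable `period_map` are hypothetical names introduced here.

```python
def hour_to_period(hour):
    """Map an hour (0-23) to its four-hour bucket label, e.g. 13 -> '12-15'."""
    if not 0 <= hour <= 23:
        raise ValueError(f'hour out of range: {hour}')
    start = (hour // 4) * 4
    return f'{start}-{start + 3}'

# Build the same 24-entry lookup table that a hand-written dict would contain
period_map = {h: hour_to_period(h) for h in range(24)}
print(period_map[13])
```

The generated dict has the same keys and labels as the literal one, so it could be passed to `Series.map` in the same way.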
#Importing graph objects for plotting maps
import plotly.graph_objects as go
import plotly as py
import datetime
py.offline.init_notebook_mode(connected = True)
#setting access token
mapbox_access_token = 'pk.eyJ1IjoibWFuYXNtaXNocmEwNyIsImEiOiJjazNzcHR0bHAwNTRhM2RteWQ2b2F4ZDBiIn0.xYDD-sP_bsZJB0LXEWcOiA'
#fetching date as string in a specific format
df['Date_Str']=df['Date'].apply(lambda x: datetime.datetime.strftime(x, '%m-%d-%Y'))
#creating dataframes for different types of criminal activities
df_ASSAULT=df[df['ASSAULT']==1]
df_LARCENY=df[df['LARCENY']==1]
df_ROBBERY=df[df['ROBBERY']==1]
df_BURGLARY=df[df['BURGLARY']==1]
df_AUTO_THEFT=df[df['AUTO THEFT']==1]
df_SHOOTING=df[df['SHOOTING']==1]
df_HOMICIDE=df[df['HOMICIDE']==1]
df_ARSON=df[df['ARSON']==1]
df_RAPE=df[df['RAPE']==1]
#creating markers using scattermapbox for different criminal activities
data_ASSAULT=go.Scattermapbox(lon = df_ASSAULT['Longitude'],
lat = df_ASSAULT['Latitude'],
mode='markers',
name='ASSAULT',
marker = dict(color='red', symbol= 'circle', opacity=0.75),
line=dict(color='rgb(200,200,200)'),
text = df_ASSAULT[['Date_Str', 'Location']])
data_LARCENY=go.Scattermapbox(
lon = df_LARCENY['Longitude'],
lat = df_LARCENY['Latitude'],
mode = 'markers',
name = 'LARCENY',
marker = dict(color='orange',
symbol='circle',
opacity=0.75),
line=dict(color='rgb(200,200,200)'),
text = df_LARCENY[['Date_Str', 'Location']])
data_ROBBERY=go.Scattermapbox(lon = df_ROBBERY['Longitude'],
lat = df_ROBBERY['Latitude'],
mode='markers',
name='ROBBERY',
marker = dict(color='yellow', symbol='circle', opacity=0.75),
line=dict(color='rgb(200,200,200)'),
text = df_ROBBERY[['Date_Str', 'Location']])
data_BURGLARY=go.Scattermapbox(lon = df_BURGLARY['Longitude'],
lat = df_BURGLARY['Latitude'],
mode='markers',
name='BURGLARY',
marker = dict(color='green', symbol='circle', opacity=0.75),
line=dict(color='rgb(200,200,200)'),
text = df_BURGLARY[['Date_Str', 'Location']])
data_AUTO_THEFT=go.Scattermapbox(lon = df_AUTO_THEFT['Longitude'],
lat = df_AUTO_THEFT['Latitude'],
mode='markers',
name='AUTO THEFT',
marker = dict(color='blue', symbol='circle', opacity=0.75),
line=dict(color='rgb(200,200,200)'),
text = df_AUTO_THEFT[['Date_Str', 'Location']])
data_HOMICIDE=go.Scattermapbox(lon = df_HOMICIDE['Longitude'],
lat = df_HOMICIDE['Latitude'],
mode='markers',
name='HOMICIDE',
marker = dict(color='brown', symbol='circle', opacity=0.75),
line=dict(color='rgb(200,200,200)'),
text = df_HOMICIDE[['Date_Str', 'Location']])
data_SHOOTING=go.Scattermapbox(lon = df_SHOOTING['Longitude'],
lat = df_SHOOTING['Latitude'],
mode='markers',
name='SHOOTING',
marker = dict(color='purple', symbol='circle', opacity=0.75),
line=dict(color='rgb(200,200,200)'),
text = df_SHOOTING[['Date_Str', 'Location']])
data_ARSON=go.Scattermapbox(lon = df_ARSON['Longitude'],
lat = df_ARSON['Latitude'],
mode='markers',
name='ARSON',
marker = dict(color='pink', symbol='circle', opacity=0.75),
line=dict(color='rgb(200,200,200)'),
text = df_ARSON[['Date_Str', 'Location']])
data_RAPE=go.Scattermapbox(lon = df_RAPE['Longitude'],
lat = df_RAPE['Latitude'],
mode='markers',
name='RAPE',
marker = dict(color='black', symbol='circle', opacity=0.75),
line=dict(color='rgb(200,200,200)'),
text = df_RAPE[['Date_Str', 'Location']])
#creating list of traces
data=[data_ASSAULT, data_LARCENY, data_ROBBERY, data_BURGLARY, data_AUTO_THEFT, data_SHOOTING, data_HOMICIDE, data_ARSON, data_RAPE]
#setting the layout of the map
layout = go.Layout(title = 'Crimes in Baltimore city',
autosize = True,
hovermode = 'closest',
showlegend = True,
mapbox = dict(accesstoken = mapbox_access_token,
bearing = 0,
center = dict(lat = 39.2904, lon = -76.6122),
pitch = 0,
zoom = 9.5,
style = 'light'))
fig = dict(data = data, layout = layout)
#plotting map
py.offline.iplot(fig)
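The nine trace definitions above differ only in crime type and colour, so they could be produced in a loop, which also keeps each legend name tied to its own data. The sketch below builds plain trace dicts (a Plotly figure given as a dict accepts traces as dicts with a `type` key) and therefore runs without Plotly. The row layout, the helper name `build_traces`, the sample coordinates, and the colour table are all assumptions for illustration.

```python
# Crime type -> marker colour (illustrative choices)
CRIME_COLORS = {
    'ASSAULT': 'red', 'LARCENY': 'orange', 'ROBBERY': 'yellow',
    'BURGLARY': 'green', 'AUTO THEFT': 'blue', 'HOMICIDE': 'brown',
    'SHOOTING': 'purple', 'ARSON': 'pink', 'RAPE': 'black',
}

def build_traces(rows, colors):
    """Group rows by crime type and return one scattermapbox-style dict per type.

    `rows` is an iterable of dicts with 'crime', 'lat', 'lon' and 'label' keys
    (a hypothetical layout chosen for this sketch).
    """
    grouped = {}
    for row in rows:
        grouped.setdefault(row['crime'], []).append(row)

    traces = []
    for crime, color in colors.items():
        points = grouped.get(crime, [])
        traces.append(dict(
            type='scattermapbox',
            name=crime,  # legend label always matches the data it describes
            mode='markers',
            lat=[p['lat'] for p in points],
            lon=[p['lon'] for p in points],
            text=[p['label'] for p in points],
            marker=dict(color=color, opacity=0.75),
        ))
    return traces

# Hypothetical sample points (not real incidents)
rows = [
    {'crime': 'SHOOTING', 'lat': 39.30, 'lon': -76.61, 'label': '01-05-2019'},
    {'crime': 'HOMICIDE', 'lat': 39.28, 'lon': -76.63, 'label': '02-11-2019'},
    {'crime': 'SHOOTING', 'lat': 39.31, 'lon': -76.60, 'label': '03-20-2019'},
]
traces = build_traces(rows, CRIME_COLORS)
print([t['name'] for t in traces])
```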
#calculating the arrest counts for each crime
count_df=pd.DataFrame([(column, df[column].sum()) for column in crimeDesc], columns=['Crime', 'Count']).sort_values(by='Count', ascending=False)
count_df=count_df.set_index('Crime')
count_df
#visualising shooting crime data in Baltimore
#importing folium to plot map
from folium.plugins import MarkerCluster
import folium
df_n=df[df['Latitude']!='NA']
df_n=df_n[df_n['Longitude']!='NA']
df_n=df_n[df_n['SHOOTING']==1]
def getIO(loc):
    if loc=='Inside':
        return 'Indoor Shooting'
    elif loc=='Outside':
        return 'Public Shooting'
    else:
        return 'Not Available'
#setting up map
b_map=folium.Map(location=[39.2904, -76.6122], zoom_start=13)
mc = MarkerCluster()
#setting up markers
for row in df_n.iterrows():
    mc.add_child(folium.Marker(location = [row[1]['Latitude'],row[1]['Longitude']], popup=getIO(row[1]['Inside_Outside'])))
b_map.add_child(mc)
#plotting map
b_map
In this project, we explored crimes that took place in Baltimore city, Maryland, between the end of 2016 and mid-2019. We used different types of visualizations, such as histograms, stacked bar plots, line graphs, and maps, to examine how arrests vary by month, age, district, race, sex, weekday, and time of day.
We found that the most common type of crime reported was larceny. Males accounted for more arrests than females. The majority of offenders were between 18 and 35 years old, and the number of offenders decreases with increasing age.
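Summary figures of this kind can be computed directly from the cleaned frame. The sketch below shows one way to do so on a made-up five-row frame; the values are assumptions for illustration and are not results from the BPD data.

```python
import pandas as pd

# Hypothetical cleaned records (not real BPD data)
toy = pd.DataFrame({
    'Description': ['LARCENY', 'LARCENY', 'ASSAULT - COMMON',
                    'ROBBERY - STREET', 'LARCENY'],
    'Offender_Sex': ['M', 'M', 'F', 'M', 'M'],
    'Offender_Age': [22, 31, 45, 19, 28],
})

# Most frequent description
most_common = toy['Description'].value_counts().idxmax()

# Fraction of records with a male offender
male_share = (toy['Offender_Sex'] == 'M').mean()

# Fraction of offenders aged 18 to 35, both ends inclusive
share_18_35 = toy['Offender_Age'].between(18, 35).mean()

print(most_common, male_share, share_18_35)
```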
We mapped the main aspects of the criminal activity that took place in Baltimore and analyzed its overall nature. This analysis can be used by the BPD and the general public to help avoid criminal encounters and reduce the overall crime rate in Baltimore. Finally, with the help of our analysis, the BPD can improve its operations in order to tackle various criminal and illegal activities.